 latent class


A Bayesian latent class reinforcement learning framework to capture adaptive, feedback-driven travel behaviour

Sfeir, Georges, Hess, Stephane, Hancock, Thomas O., Rodrigues, Filipe, Rad, Jamal Amani, Bliemer, Michiel, Beck, Matthew, Khan, Fayyaz

arXiv.org Machine Learning

Many travel decisions involve a degree of experience formation, where individuals learn their preferences over time. At the same time, there is extensive scope for heterogeneity across individual travellers, both in their underlying preferences and in how these evolve. The present paper puts forward a Latent Class Reinforcement Learning (LCRL) model that allows analysts to capture both of these phenomena. We apply the model to a driving simulator dataset and estimate the parameters through Variational Bayes. We identify three distinct classes of individuals that differ markedly in how they adapt their preferences: the first displays context-dependent preferences with context-specific exploitative tendencies; the second follows a persistent exploitative strategy regardless of context; and the third engages in an exploratory strategy combined with context-specific preferences.


Revisiting Theory of Contrastive Learning for Domain Generalization

Alvandi, Ali, Rezaei, Mina

arXiv.org Machine Learning

Contrastive learning is among the most popular and powerful approaches for self-supervised representation learning, where the goal is to map semantically similar samples close together while separating dissimilar ones in the latent space. Existing theoretical methods assume that downstream task classes are drawn from the same latent class distribution used during the pretraining phase. However, in real-world settings, downstream tasks may not only exhibit distributional shifts within the same label space but also introduce new or broader label spaces, leading to domain generalization challenges. In this work, we introduce novel generalization bounds that explicitly account for both types of mismatch: domain shift and domain generalization. Specifically, we analyze scenarios where downstream tasks either (i) draw classes from the same latent class space but with shifted distributions, or (ii) involve new label spaces beyond those seen during pretraining. Our analysis reveals how the performance of contrastively learned representations depends on the statistical discrepancy between pretraining and downstream distributions. This extended perspective allows us to derive provable guarantees on the performance of learned representations on average classification tasks involving class distributions outside the pretraining latent class set.
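To make the stated goal concrete (pull similar samples together, push dissimilar ones apart), here is a minimal sketch of one common contrastive objective, an InfoNCE-style loss with cosine similarity. The abstract does not say which loss the authors analyze, so the loss form, the `temperature` value, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor (assumed form, not taken from the paper).

    The loss is small when the positive is more similar to the anchor
    than the negatives are, and grows as the positive drifts away.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical toy embeddings in 2-D.
anchor = [1.0, 0.0]
close_positive = [0.9, 0.1]   # nearly aligned with the anchor
far_positive = [0.0, 1.0]     # orthogonal to the anchor
negatives = [[-1.0, 0.0], [0.0, -1.0]]

loss_close = info_nce(anchor, close_positive, negatives)
loss_far = info_nce(anchor, far_positive, negatives)
print(loss_close < loss_far)
```

The loss is lower when the positive sample sits close to the anchor in the embedding space, which is the behaviour the abstract describes in words.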


Enhancing Phenotype Discovery in Electronic Health Records through Prior Knowledge-Guided Unsupervised Learning

Mayer, Melanie, Lactaoen, Kimberly, Weissman, Gary E., Himes, Blanca E., Hubbard, Rebecca A.

arXiv.org Machine Learning

Objectives: Unsupervised learning with electronic health record (EHR) data has shown promise for phenotype discovery, but approaches typically disregard existing clinical information, limiting interpretability. We operationalize a Bayesian latent class framework for phenotyping that incorporates domain-specific knowledge to improve clinical meaningfulness of EHR-derived phenotypes and illustrate its utility by identifying an asthma sub-phenotype informed by features of Type 2 (T2) inflammation.

Materials and Methods: We illustrate a framework for incorporating clinical knowledge into a Bayesian latent class model via informative priors to guide unsupervised clustering toward clinically relevant subgroups. This approach models missingness, accounting for potential missing-not-at-random patterns, and provides patient-level probabilities for phenotype assignment with uncertainty. Using reusable and flexible code, we applied the model to a large asthma EHR cohort, specifying informative priors for T2 inflammation-related features and weakly informative priors for other clinical variables, allowing the data to inform posterior distributions.

Results and Conclusion: Using encounter data from January 2017 to February 2024 for 44,642 adult asthma patients, we found a bimodal posterior distribution of phenotype assignment, indicating clear class separation. The T2 inflammation-informed class (38.7%) was characterized by elevated eosinophil levels and allergy markers, plus high healthcare utilization and medication use, despite weakly informative priors on the latter variables. These patterns suggest an "uncontrolled T2-high" sub-phenotype. This demonstrates how our Bayesian latent class modeling approach supports hypothesis generation and cohort identification in EHR-based studies of heterogeneous diseases without well-established phenotype definitions.
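As a rough illustration of what patient-level phenotype probabilities from a latent class model look like, the sketch below applies Bayes' rule to a two-class model with two binary features. It is not the authors' code: it plugs in Beta prior means instead of running full posterior inference, it ignores the missingness modelling the abstract mentions, and the feature meanings, Beta parameters, and class prevalences are hypothetical values chosen for the example.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, used here as a plug-in item probability."""
    return a / (a + b)


def class_posterior(x, prevalence, item_probs):
    """Return P(class | x) for binary features x under a latent class model.

    Features are assumed conditionally independent given the class,
    which is the standard latent class assumption.
    """
    joint = []
    for pi_c, theta_c in zip(prevalence, item_probs):
        lik = pi_c
        for xj, theta in zip(x, theta_c):
            lik *= theta if xj == 1 else (1.0 - theta)
        joint.append(lik)
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical setup: class 0 plays the role of a "T2-high" phenotype.
# Features: [elevated eosinophils, allergy marker present] (illustrative only).
prevalence = [0.4, 0.6]
item_probs = [
    [beta_mean(8, 2), beta_mean(7, 3)],  # informative priors favouring the markers
    [beta_mean(2, 8), beta_mean(3, 7)],  # priors favouring their absence
]

post_high = class_posterior([1, 1], prevalence, item_probs)
post_low = class_posterior([0, 0], prevalence, item_probs)
print(post_high, post_low)
```

A patient with both markers is assigned to class 0 with high probability and a patient with neither is assigned to class 1, which is the kind of clear separation the abstract reports as a bimodal distribution of assignment probabilities.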